A standardized database of Chinese emotional short videos based on age and gender differences

Most existing emotion elicitation databases use film clips as stimuli and do not take into account the age and gender differences of participants. Because short videos are brief, easy to understand and emotionally compelling, we use them to construct a standardized database of Chinese emotional short videos through a joint analysis of age and gender differences. Two experiments were performed to establish and validate the database. In Experiment 1, we selected 240 stimuli from 2700 short videos and analyzed the subjective evaluation results of 360 participants of different ages and genders. As a result, a total of 54 short videos covering three categories of emotions were selected for 6 groups of participants, namely males and females aged 20-24, 25-29 and 30-34. In Experiment 2, we recorded the EEG signals and subjective experience scores of 81 participants while they watched different video stimuli. Both the EEG emotion recognition results and the subjective evaluations indicate that our database of 54 short videos achieves better emotion elicitation than film clips. Furthermore, the targeted delivery of specific short videos was also verified to be effective, which helps researchers choose appropriate emotion elicitation stimuli for different participants and promotes the study of individual differences in emotional responses.


Introduction
Emotion has long been an active research topic in psychology and artificial intelligence [1][2][3]. As an essential step in affective computing, emotion elicitation has also drawn increasing attention [4]. Creating effective emotion elicitation databases has therefore become a popular topic among researchers interested in emotion.
Researchers have tried a range of methods for eliciting emotion in the laboratory, such as interactive training, hypnosis, pictures, music, slides and film clips [5][6][7]. Compared to other emotional stimuli, film clips (i.e., portions of full-length films) offer several advantages in emotion elicitation tasks. Firstly, film clips are dynamic stimuli with both auditory and visual channels, which strongly attract attention [8]. Secondly, they have relatively high ecological validity and can induce strong subjective experiences and physiological changes [5]. Thirdly, film clips present continuous emotional scenes and can capture emotions that develop over time [9]. Furthermore, a meta-analysis of emotion elicitation has confirmed that film clips are one of the most effective ways to elicit emotion [10]. Generally, the selection of film clips for emotion elicitation tasks should follow three criteria [5]: relatively short duration, comprehensibility without additional explanation, and induction of a specific target emotion.
Considerable effort has gone into constructing emotional film databases over the years. According to the underlying emotion model [11], the existing emotional film databases can be divided into two categories: dimensional model-based and discrete model-based. The dimensional model holds that emotions are characterized by combinations of dimensions, such as valence, arousal and dominance [12]. Baveye et al. [13] adopted a crowdsourcing approach, asking annotators to rate the degree of valence and arousal, and built a large, freely shared database of 9,800 film clips. Zheng et al. [14] edited 15 film clips from 6 films and divided their emotions into three categories (positive, neutral and negative) along the valence dimension. Ismail et al. [15] showed 24 videos to 42 participants in an online survey, and the ratings on the valence-arousal dimensions indicated that 79 percent of the videos successfully elicited the target emotion.
The discrete model, in contrast, suggests that emotions can be classified into basic categories. The first discrete emotional film database was developed in [16] to elicit the six basic emotions proposed by Ekman [17]. Thereafter, Gross and Levenson [5] proposed the success index to quantify the effect of film clips on eliciting emotions and constructed a film database that can induce eight emotions. To provide a more immersive experience, Jeong et al. [18] built a database of 4D films with the aid of chair movements, vibrations, winds and scents.
It is worth noting that participants from different cultures and languages respond differently to the same emotional stimulus [19][20][21]. Researchers have therefore created separate databases for different cultures and languages. Michelini et al. [22] designed a film database for Latin Americans, in which emotional states can be analyzed from both dimensional and discrete perspectives. Shalchizadeh et al. [23] recorded the emotional responses of 88 participants through a website and established a database of 21 film clips for Persian culture. Another film database was established for Asians in [24]; it can induce eight emotions, with 8 film clips per emotion type. Based on the self-assessment results of 50 college students on 30 Chinese film clips, 18 film clips were selected to form a standardized emotional film database in [25]. To further explore the structure of positive emotions, Zhang et al. [26] designed a database of 22 film clips covering four positive emotions (empathy, fun, creativity and esteem).
Individual differences among participants are an unavoidable factor in the study of emotion elicitation. The emotional responses of different participants may vary greatly in the same elicitation situation. Numerous studies have shown age-related differences in emotional responses [27][28][29]. Mather et al. [27] found that for older adults (70-90 years), seeing positive pictures leads to stronger amygdala activation than seeing negative ones, whereas this phenomenon was not found in younger adults (18-29 years). Buriss et al. [28] found that older (66-95 years) and middle-aged (36-65 years) adults report higher valence in response to pleasant slides and lower valence in response to unpleasant ones than younger adults (18-35 years). Jenkins et al. [29] developed a database of verbal and non-verbal contemporary films for a wide age range of participants and concluded that, for both positive and negative film clips, participants aged 46-88 have stronger emotional responses than those aged 18-45. Gender differences in emotional responses have likewise been explored extensively [30][31][32]. Bradley et al. [30] measured the emotional reactivity of males and females viewing emotional pictures and suggested that females show greater defensive responses to aversive pictures, whereas males show stronger appetitive activation only when viewing pornography. The difference in sensitivity to negative stimuli between males and females was studied in [31], and the experimental results showed that females are more sensitive to negative stimuli. Deng et al. [32] explored gender differences in both emotional experience and emotional expression and concluded that males typically have stronger emotional experiences, while females are better at expressing emotions, especially negative ones.
From the above analysis, it can be concluded that although the researchers have designed various experiments exploring the age and gender differences in emotional responses, the issue of individual differences has not been considered in the construction of emotion elicitation databases. Therefore, it is necessary to pay attention to the age, gender and cultural background of participants and provide specific emotion-evoking stimuli for different participants.
A further question is how to evaluate the quality of an emotion elicitation database. The most common method is to ask participants to rate the eliciting material according to their feelings during the emotion elicitation phase. Unfortunately, it is difficult to describe emotions accurately and quantitatively by relying on subjective assessment alone. The occurrence of emotions is usually accompanied by changes in physiological signals, such as galvanic skin response (GSR), heart rate (HR), electrocardiogram (ECG) and electroencephalography (EEG) signals [33]. Compared with other physiological signals, EEG directly records changes in scalp potentials and reflects emotions more objectively and accurately, and it has therefore been widely used in affective computing [34][35][36]. Koelstra et al. [37] used 32 active AgCl electrodes to capture the EEG signals of participants watching music videos and found a significant correlation between participant ratings and EEG frequencies. A high-density EEG signal database was built with a 128-channel EEG device in [38], together with a feature selection method for emotion recognition. In recent years, several consumer-grade EEG devices offering low cost, good portability and reliability have also been developed, such as Emotiv, OpenBCI and NeuroSky [39]. Liu et al. [40] used the Emotiv EPOC with 14 electrodes to record the EEG signals of participants watching film clips. Stylianos et al. [41] developed OpenBCI-based software for detecting, displaying and analyzing EEG signals. Because EEG signals reflect emotions more faithfully, the elicitation effect of a stimulus database can be evaluated through the accuracy of EEG emotion recognition.
Taking individual differences into account, the main purpose of this paper is to build a standardized Chinese emotion elicitation database for participants of different ages and genders. Because short videos are brief, easy to understand and emotionally compelling, we choose them as emotion-evoking stimuli. For constructing the database of Chinese emotional short videos, we propose three hypotheses: 1) individual differences in age and gender lead to differences in emotional responses when watching emotional short videos; 2) compared with film clips, short videos produce a better emotion elicitation effect; 3) specific short videos designed for participants of different ages and genders produce a better emotion elicitation effect than those that do not account for individual differences. Compared with the existing emotion elicitation databases, our database has the following characteristics: 1) it is the first attempt to use short videos as emotion-evoking materials; 2) the age and gender differences of participants are considered jointly for the first time to provide specific short videos for different groups; 3) the subjective and physiological responses of participants are recorded simultaneously, and the EEG signals are used for emotion recognition to evaluate the quality of our emotion elicitation database.

Experiment 1
The purpose of Experiment 1 is to explore whether there are age and gender differences in emotional responses when watching short videos, and to select, for each group, the short videos most likely to elicit the target emotions for constructing our emotion elicitation database.

Methods
Ethics. The experimental procedures were approved by the Institutional Review Board of State Key Laboratory of Media Convergence and Communication of Communication University of China (CUCE-2022-017). All participants voluntarily agreed to participate in this study and signed the written informed consent to have data from their records used in research and publication of these case details. All data have been fully anonymized before we accessed them, and the database can be used only for academic research upon request for approval.
Materials. In recent years, with the rapid development of mobile Internet technology, the short video industry has been rising rapidly. In fact, a large number of short video databases have been proposed and used for different tasks, such as target detection [42], behavior recognition [43] and person identification [44]. However, to the best of our knowledge, the short videos have not been used for the construction of emotion elicitation databases so far. We believe that the short videos may be more suitable for emotion elicitation compared to film clips. Firstly, as the name suggests, the short videos are short in length and can be directly used as emotion elicitation materials without secondary editing. Secondly, the short videos are carefully edited, compact in plot and easy to understand. Thirdly, the short videos have specific themes and can bring strong emotional impact in a short time.
The short videos used in this experiment were obtained as follows. Thirty research assistants aged 20-24, 25-29 and 30-34, with each age group including 5 males and 5 females, were trained in the definition and subjective assessment of different emotions. They were asked to select short videos that might elicit eight categories of emotions according to [45], i.e., four negative emotions (disgust, anger, fear and sadness), neutrality and three positive emotions (tenderness, amusement and joy). We specified the following criteria for the selection of short videos: 1) 60-240 seconds in duration; 2) easy to understand without additional explanation; 3) eliciting a specific emotion; 4) Chinese culture and language; and 5) horizontal screen. After the subjective selection by the assistants, 2,700 short videos (duration: 60-236 seconds, mean = 148.69 seconds) were selected and labeled with their emotional categories. Then, these short videos were viewed independently by four cognitive psychologists aged 32-45 (2 males and 2 females) with experience in affective assessment. For each short video, the cognitive psychologists subjectively assessed its potential to successfully elicit one and only one of the target emotion categories. Only the short videos that all four cognitive psychologists unanimously judged satisfactory were retained, giving 240 short videos (duration: 71-232 seconds, mean = 150.06 seconds) for further study. It is worth noting that the emotional categories of the short videos were not changed after the evaluation by the four cognitive psychologists. The number of short videos for each emotion category ranged from 25 to 35.
Participants. We recruited volunteers on the campus of Communication University of China by posting a recruitment poster. Because participants who are extroverted and emotionally stable are more likely to experience the target emotion, we used the Eysenck Personality Questionnaire Short Scale for Chinese (EPQ-RSC) [46] to screen the volunteers. The EPQ-RSC conceptualizes personality in four dimensions: Psychoticism, Extraversion, Neuroticism and Lie. Different tendencies and degrees of expression in these four dimensions constitute different personality traits. A total of 482 volunteers enrolled, and we finally selected 360 participants based on the EPQ-RSC. These participants fall into three age groups of 20-24, 25-29 and 30-34, with each age group including 60 males and 60 females. All of them are healthy and right-handed, and none majored in psychology. Prior to the formal experiment, all participants were informed of the experimental procedure. Each participant received 120 RMB as compensation.
Measures. 1) Self-assessment 9-point scale. In the subjective evaluation stage, we designed a Self-assessment 9-point scale to evaluate the degrees of valence and arousal of participants when watching short videos. This scale is adapted from the Self-assessment manikin (SAM) [47] that measures the degree of emotional responses in the form of pictures. The scale for this experiment considers the ratings of both valence (from very negative to very positive) and arousal (from very calm to very excited) associated with each short video. Each aspect was assessed on a 9-point Likert scale, meaning that the participants were asked to choose the score from 1 to 9 that best matched their true feelings when watching the short videos. Furthermore in order to avoid understanding bias and obtain more uniform results, we gave each participant an additional guide for their reference which clearly defined the meaning of each score.
2) Emotional evaluation scale. To investigate the intensity of each emotional dimension, we developed an emotional evaluation scale adapted from the Differential Emotions Scale (DES) [16] that measures the differentiation component of emotions. The scale includes three emotional states, i.e., positive, neutral and negative. The participants were asked to score the emotional intensity of each state on a 9-point Likert scale (from not at all to extremely) according to their real feelings when watching short videos. The ratings of this scale were used later to calculate the success index, which is an objective criterion for selecting the effective emotional stimuli.
Procedure. The experiment was conducted online over a period of 6 days. The selected 240 short videos were pseudo-randomly assigned to six subsets, with at most six short videos of the same emotion category in each subset. Correspondingly, the 360 participants were divided into six groups, each covering the three age ranges (20-24, 25-29 and 30-34) with 10 males and 10 females per age range. On each day, one of the six groups attended an online meeting at a fixed time (2:00-5:00 pm) in a bright and quiet environment. The participants were asked to watch one subset of short videos presented in randomized order, with the constraint that no more than two short videos of the same emotion category, or three short videos of the same valence state, were shown consecutively. The participants did not watch these short videos in advance, and they also indicated that they had not watched them within the past month. Before the formal experiment, an experimenter introduced the experimental procedure and the scoring criteria of the above two scales. A 30-second blank screen was presented before each video to help the participants clear their minds of all thoughts, feelings and memories. The participants were then asked to watch each short video carefully; they could look away from the screen if the video content made them feel strongly uncomfortable. After watching each short video, the participants were encouraged to complete the above two scales based on their immediate true feelings. At the end of the experiment, the experimenter explained that the purpose was to examine the differences in emotional responses among participants of different ages and genders when watching short videos. In total, each participant watched 40 short videos and each short video was viewed by 60 participants.
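The ordering constraint described in this procedure can be illustrated with a minimal Python sketch. The paper does not state how the randomized orders were generated, so the rejection-sampling approach below is an assumption, as are the function names and the toy subset of 5 videos per category (40 in total, consistent with the subset size but not the actual composition).

```python
import random

# Valence state of each of the eight emotion categories named in the text.
VALENCE = {
    "disgust": "negative", "anger": "negative", "fear": "negative",
    "sadness": "negative", "neutrality": "neutral",
    "tenderness": "positive", "amusement": "positive", "joy": "positive",
}


def longest_run(labels):
    """Return the length of the longest run of identical consecutive labels."""
    best = run = 0
    prev = object()  # sentinel that equals no label
    for label in labels:
        run = run + 1 if label == prev else 1
        prev = label
        best = max(best, run)
    return best


def is_valid_order(categories, max_same_category=2, max_same_valence=3):
    """Check both limits on consecutive videos (category and valence)."""
    valences = [VALENCE[c] for c in categories]
    return (longest_run(categories) <= max_same_category
            and longest_run(valences) <= max_same_valence)


def constrained_shuffle(categories, rng, max_tries=100_000):
    """Assumed method: reshuffle until the order satisfies both limits."""
    order = list(categories)
    for _ in range(max_tries):
        rng.shuffle(order)
        if is_valid_order(order):
            return order
    raise RuntimeError("no valid order found within max_tries")


# Hypothetical subset: 5 videos from each of the 8 categories (40 videos).
subset = [category for category in VALENCE for _ in range(5)]
order = constrained_shuffle(subset, random.Random(0))
```

Rejection sampling is the simplest option here because a valid order is found after relatively few shuffles for subsets of this size; a constructive method would be preferable for much tighter limits.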
Data analysis. The results of each participant were averaged across the short videos for each emotion category. The valence and arousal scores were examined in two separate three-way mixed analyses of variance (ANOVAs) with the factors age (20-24, 25-29 and 30-34), gender (male and female) and emotion category (disgust, anger, fear, sadness, neutrality, tenderness, amusement and joy). Age and gender were between-subject factors, while emotion category was a within-subject factor. The level of statistical significance was set at p < 0.05, and effect sizes are reported as partial eta squared (η²) for ANOVA effects. The valence and arousal scores of each emotion category were verified to follow a normal distribution, and multiple pairwise comparisons were corrected with Bonferroni's method. All analyses were performed with IBM SPSS Statistics 25.0.
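As a reading aid, partial eta squared can be recovered from a reported F statistic and its degrees of freedom through the standard identity η² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below is a generic cross-check and not the SPSS procedure used by the authors; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# Main effect of emotion category on valence, as reported in the Results:
# F(7, 2478) = 2237.96.
valence_emotion = partial_eta_squared(2237.96, 7, 2478)

# Main effect of gender on arousal, as reported in the Results:
# F(1, 354) = 31.32.
arousal_gender = partial_eta_squared(31.32, 1, 354)
```

Rounded to two decimals, these evaluate to 0.86 and 0.08, which agrees with the reported effect sizes.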

Results
Valence effects. For each emotion category scored by participants of different ages and genders, the descriptive statistical results on the valence dimension are shown in Table 1. The mean score (M) and standard deviation (SD) were calculated to evaluate the valence degree of participants elicited by short videos.
In the three-way mixed ANOVA on the valence dimension, there was a significant main effect of emotion category (F(7, 2478) = 2237.96, p < 0.001, η² = 0.86). The post-hoc test indicated that there were no significant differences among the short videos with three positive emotions (tenderness, amusement and joy), but they were significantly different from those with other emotion categories. All the valence scores of positive short videos were significantly higher than those of neutral ones, and the valence scores of neutral videos were significantly higher than those of negative ones. More specifically, the valence scores of emotion categories differed significantly as follows: sadness, disgust, fear < anger (p < 0.001) < neutrality (p < 0.001) < tenderness, joy, amusement (p < 0.001). The interaction effect of emotion category by age (F(14, 2478) = 6.20, p < 0.001, η² = 0.03) was significant. Further, the results of simple effect analysis are shown in [...]. Moreover, the interaction effect of emotion category by age by gender (F(14, 2478) = 4.26, p < 0.001, η² = 0.02) was significant. According to our research interests, the simple-simple effect analysis of age on gender and emotion category was conducted first. As shown in [...]
Arousal effects. For each emotion category scored by participants of different ages and genders, the descriptive statistical results on the arousal dimension are shown in Table 2. The values of M and SD were calculated to evaluate the arousal degree of participants elicited by short videos.
In the three-way mixed ANOVA on the arousal dimension, there was a significant main effect of emotion category (F(7, 2478) = 634.75, p < 0.001, η² = 0.64). The post-hoc test indicated that the arousal scores of neutral short videos were significantly lower than those of positive and negative ones (all p < 0.001). Furthermore, the short videos eliciting disgust received higher arousal scores than those inducing anger (M = 6.38, SD = 0.88 versus M = 6.08, SD = 1.22; p = 0.005) and sadness (M = 6.38, SD = 0.88 versus M = 6.14, SD = 0.92; p = 0.010). The short videos eliciting tenderness received higher arousal scores than those inducing anger (M = 6.47, SD = 1.37 versus M = 6.08, SD = 1.22; p < 0.001) and sadness (M = 6.47, SD = 1.37 versus M = 6.14, SD = 0.92; p = 0.005). The interaction effect of emotion category by age [...]. Moreover, the interaction effect of emotion category by age by gender (F(14, 2478) = 1.81, p = 0.038, η² = 0.01) was significant. According to our research interests, the simple-simple effect analysis of age on gender and emotion category was conducted first. As shown in [...]
As for the factor of gender, a significant main effect was also found (F(1, 354) = 31.32, p < 0.001, η² = 0.08). The post-hoc test indicated that the arousal scores of females were significantly higher than those of males (M = 5.86, SD = 1.72 versus M = 5.62, SD = 1.73; p < 0.001). There were no significant differences in the main effect of age (F(2, 354) = 1.68, p = 0.188, η² = 0.01) or in the interaction effect of age by gender (F(2, 354) = 2.07, p = 0.127, η² = 0.01).
Selected short videos. Through the analysis on valence and arousal effects, we found that when watching emotional short videos, the emotional responses of participants did differ by age and gender. Based on this finding, we have reasons to believe that targeting different emotional short videos for participants of different ages and genders will contribute to better emotion elicitation effects. In the experiment, we used an objective criterion of success index [5] to select effective short videos. For each short video, its success index is obtained by summing the z-scores of hit rate and intensity. Therein, the hit rate is measured as the proportion of participants who rated the target emotion at least one point higher than other emotion categories, and the intensity is the mean rating of target emotion [45]. Both the hit rate and intensity can be calculated by the results of the emotional evaluation scale rated by 360 participants.
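The success index described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: each rating is assumed to be a dictionary of 9-point scores for the three states of the emotional evaluation scale, z-scores are assumed to be computed across all videos passed in (the text does not say whether standardization was done across the whole pool or within each category), and the sample standard deviation is used. All names and the toy ratings are hypothetical.

```python
from statistics import mean, stdev

STATES = ("positive", "neutral", "negative")


def hit_rate(ratings, target):
    """Share of participants rating the target state at least one point
    above every other state."""
    hits = sum(
        all(r[target] >= r[state] + 1 for state in STATES if state != target)
        for r in ratings
    )
    return hits / len(ratings)


def intensity(ratings, target):
    """Mean rating of the target state."""
    return mean(r[target] for r in ratings)


def z_scores(values):
    """Standardize values (assumes at least two distinct values)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def success_indices(videos):
    """videos: list of (target_state, ratings) pairs; one index per video."""
    hits = [hit_rate(ratings, target) for target, ratings in videos]
    intensities = [intensity(ratings, target) for target, ratings in videos]
    return [zh + zi for zh, zi in zip(z_scores(hits), z_scores(intensities))]


# Toy data: three candidate positive videos, two raters each.
toy_videos = [
    ("positive", [{"positive": 8, "neutral": 2, "negative": 1},
                  {"positive": 7, "neutral": 3, "negative": 1}]),
    ("positive", [{"positive": 5, "neutral": 5, "negative": 2},
                  {"positive": 6, "neutral": 3, "negative": 2}]),
    ("positive", [{"positive": 3, "neutral": 6, "negative": 2},
                  {"positive": 4, "neutral": 4, "negative": 3}]),
]
indices = success_indices(toy_videos)
```

Ranking the videos by `indices` in descending order and keeping the top three per valence state and participant group would mirror the selection described in the text.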
Considering that existing EEG-based emotion recognition algorithms perform well on two-category and three-category tasks, and in order to further validate our database through EEG signals in the subsequent experiment, we combined the original eight emotional categories into three emotional states and built a database of Chinese emotional short videos based on age and gender differences containing three valence states, i.e., positive, neutral and negative. Guided by the success index, we selected a total of 54 short videos (duration: 91-229 seconds, mean = 160.04 seconds) as emotion elicitation materials for 6 groups of participants, namely males and females aged 20-24, 25-29 and 30-34. Each group of participants corresponds to 9 short videos: 3 positive, 3 neutral and 3 negative stimuli. To facilitate the use of our emotional short video database, we also defined naming rules for the short video files. The name of a short video file consists of four parts, i.e., "valence (positive, neutral, negative)_gender (1="male", 2="female")_age (1="20-24 years", 2="25-29 years", 3="30-34 years")_number (1, 2, 3).MP4". For example, the file named "positive_2_3_1.MP4" denotes a short video used to induce positive emotion for females aged 30-34, and its number is 1. Due to copyright issues, we are not able to provide the actual short videos, but the download links are included in "S1 Video" of the supporting information.

PLOS ONE

Experiment 2
The purpose of Experiment 2 is to capture the EEG signals and subjective experiences of participants of different ages and genders while they watch different emotional stimuli, in order 1) to verify the reliability of short videos as audio-visual stimuli for eliciting emotions compared with film clips, and 2) to validate the effectiveness of targeted delivery of specific emotional short videos to different participant groups. We chose EEG to validate the selected 54 short videos because it directly records changes in scalp potentials and reflects emotions more objectively and accurately than other physiological signals. In this experiment, the EEG-based emotion recognition results also provide objective support for the subjective experience ratings.

Methods
Ethics. The ethics approval and consent procedure were the same as in Experiment 1.
Materials. In Experiment 1, we selected a total of 54 emotional short videos for 6 groups of participants of different ages and genders. For each group, 9 short videos were provided, including 3 positive, 3 neutral and 3 negative stimuli. To make a fair comparison, we also selected 9 film clips (duration: 41-114 seconds, mean = 67.78 seconds) from the Chinese film database developed in [45], which includes 22 film clips (duration: 41-166 seconds, mean = 82.50 seconds) for eliciting eight categories of emotions, i.e., four negative emotions (disgust, anger, fear and sadness), neutrality and three positive emotions (tenderness, amusement and joy). More specifically, three positive (Singing When We Are Young, Just Another Pandora's Box, Hear Me), three neutral (Raise the Red Lantern, Black Coal Thin Ice, Space Millennium) and three negative film clips (City of Life and Death, Bodyguards and Assassins, The Chrysalis) were selected according to their success index ranking. Similar to the emotional short videos, the 9 film clips were renamed in the format of "valence (positive, neutral, negative)_film_number (1, 2, 3).MP4". For example, the file named "positive_film_1.MP4" stands for the film clip "Singing When We Are Young".
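The two naming rules (short videos from Experiment 1 and film clips here) lend themselves to a small parser. The following Python sketch is only an illustration: the `Stimulus` type and `parse_stimulus_name` function are hypothetical names, and only the filename patterns and code meanings are taken from the text.

```python
import re
from typing import NamedTuple, Optional

# Code meanings stated in the naming rules.
GENDERS = {1: "male", 2: "female"}
AGE_GROUPS = {1: "20-24", 2: "25-29", 3: "30-34"}

SHORT_VIDEO = re.compile(
    r"(positive|neutral|negative)_([12])_([123])_([123])\.MP4")
FILM_CLIP = re.compile(r"(positive|neutral|negative)_film_([123])\.MP4")


class Stimulus(NamedTuple):
    kind: str                        # "short_video" or "film_clip"
    valence: str
    number: int
    gender: Optional[str] = None     # only set for short videos
    age_group: Optional[str] = None  # only set for short videos


def parse_stimulus_name(filename: str) -> Stimulus:
    """Decode a stimulus file name that follows either naming rule."""
    match = SHORT_VIDEO.fullmatch(filename)
    if match:
        valence, gender, age, number = match.groups()
        return Stimulus("short_video", valence, int(number),
                        GENDERS[int(gender)], AGE_GROUPS[int(age)])
    match = FILM_CLIP.fullmatch(filename)
    if match:
        valence, number = match.groups()
        return Stimulus("film_clip", valence, int(number))
    raise ValueError(f"unrecognized stimulus file name: {filename!r}")


video = parse_stimulus_name("positive_2_3_1.MP4")
film = parse_stimulus_name("positive_film_1.MP4")
```

Here `video` decodes to a positive short video for females aged 30-34 with number 1, matching the example given for the short video naming rule.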
Participants. Since we needed to collect the EEG signals of participants, two conditions were added to the participant recruitment requirements given in Experiment 1: no head wounds, and no hair dyeing or perming within the past month. To ensure the independence of Experiment 1 and Experiment 2, the participants recruited in this experiment were different from those in Experiment 1. We received a total of 412 applications, from which 100 volunteers were selected based on the EPQ-RSC evaluation results and the recruitment requirements. Before the experiment, the participants were reminded to get quality sleep, wash their hair in advance, and avoid stimulants (e.g., tobacco, alcohol, coffee). As in Experiment 1, each participant was informed of the experimental procedure and signed an informed consent form. During the acquisition of EEG signals, we further excluded 19 participants because they could not adjust to a relaxed state. The final sample was composed of 81 participants (age range 20-24: 36 males and 20 females; 25-29: 8 males and 7 females; 30-34: 5 males and 5 females). Each participant received 150 RMB as payment after completing the whole experiment.
Measures. The measures were the same as those taken in Experiment 1. Specifically for the integrity of subjective assessment, three dimensions were added to the Self-assessment 9-point scale in this experiment. They are namely dominance (from being totally controlled to having strong control power), liking (from very disliked to very liked) and familiarity (from very unfamiliar to very familiar).
Procedure. The experimental environment consists of the subject room and the experimenter room. As shown in Fig 9, the EEG signal acquisition experiment was conducted in a quiet, bright subject room. The stimuli were presented on a 24.5-inch monitor with a refresh rate of 165 Hz. The EEG signals were acquired with the Emotiv EPOC X, a wireless portable EEG device with 14 channels (i.e., AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4 in the international 10-20 system), a bandwidth of 0.2-43 Hz and a sampling frequency of 256 Hz. The Emotiv EPOC X and its electrode distribution are shown in Fig 10. The EEG data were recorded with the EmotivPRO software. At the same time, we used the MER-502-79U3C industrial camera to capture face videos and a skin electrical sensor to obtain GSR data from the left hand of the participant. To avoid interference from electromagnetic devices with the EEG signals, the monitor in the subject room was connected by cable to the host computer. The host computer was placed in the experimenter room, and the experimenters could observe the equipment connections and the participant's status in real time on the monitor in the experimenter room.
Before the start of the formal experiment, the experimenter first introduced the experimental procedure and subjective evaluation scales in detail. And then, two experimenters helped the participant to wear the EEG cap and skin electrical sensor. The participant was reminded to adjust the seat to a comfortable state according to his viewing habits to ensure that he could see the screen clearly. Subsequently, the participant got familiar with the experimental procedure through a practice (one trial). The practice consisted of five steps, including baseline recording, watching the video stimulus, filling in subjective evaluation scales, completing calculation questions and taking a break. During a 15-second baseline recording, the participant was required to remain relaxed, blink as infrequently as possible, and look at a fixation cross "+" on the screen. Then a video stimulus was displayed and the participant was asked to stay as still as possible and blink as infrequently as possible when watching the stimulus. After that, the participant filled in the subjective evaluation scales based on his immediate true feelings. To eliminate the effect of the previous stimulus, the participant was asked to complete two simple calculation questions within ten as a distraction. Next, a 30-second period of rest was taken, during which a blank screen was displayed and the participant was asked to clear his brain of all thoughts, feelings and memories as much as possible. When the participant was ready to start the formal experiment, two experimenters left the subject room.
To reduce the influence of human biological rhythms, the experiments were carried out during the daytime, specifically from 9:00 to 11:30 am and from 3:00 to 5:30 pm. Fig 11 shows the flowchart of the EEG signal acquisition experiment. After clicking the start button on the screen, the participant filled in basic information (age, gender, profession, education and major), read the experiment description and completed a 5-minute baseline recording, alternating between eyes open and eyes closed every 15 seconds. As shown in Fig 11, the whole experiment comprised two blocks with a total of 27 trials. In the first block, the participant watched 13 video stimuli: 9 specific short videos (3 positive, 3 neutral and 3 negative) tailored to the participant's age and gender, and 4 of the 9 selected Chinese film clips (2 positive, 1 neutral and 1 negative). In the second block, the participant watched 14 video stimuli: 9 comparison short videos (3 positive, 3 neutral and 3 negative) used for analyzing the differences in age and gender, and the other 5 film clips (1 positive, 2 neutral and 2 negative). No more than three video stimuli with the same valence were shown consecutively. To avoid fatigue, there was a break of about 20 minutes between the two blocks. The participant could terminate the experiment immediately upon feeling any discomfort. Fig 12 shows a participant shortly before the start of the experiment.
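The ordering constraint on the two blocks can be illustrated with a short sketch. The text does not say how the presentation order was generated, so the rejection-sampling shuffle below is an assumption rather than the procedure actually used, and the stimulus identifiers and the names `make_stimuli`, `longest_run` and `make_block_order` are hypothetical. Only the block compositions and the limit of three consecutive stimuli with the same valence come from the description above.

```python
import random

MAX_RUN = 3  # at most three consecutive stimuli may share a valence (from the text)
VALENCES = ("positive", "neutral", "negative")


def make_stimuli(prefix, counts):
    """Build hypothetical (stimulus_id, valence) pairs for one stimulus type."""
    return [(f"{prefix}_{valence}_{i}", valence)
            for valence, n in zip(VALENCES, counts)
            for i in range(1, n + 1)]


def longest_run(labels):
    """Return the length of the longest run of identical consecutive labels."""
    longest = current = 0
    previous = None
    for label in labels:
        current = current + 1 if label == previous else 1
        longest = max(longest, current)
        previous = label
    return longest


def make_block_order(stimuli, rng, max_run=MAX_RUN):
    """Reshuffle until no valence occurs more than max_run times in a row.

    Rejection sampling is an assumed strategy, chosen here for simplicity.
    """
    order = list(stimuli)
    while True:
        rng.shuffle(order)
        if longest_run([valence for _, valence in order]) <= max_run:
            return order


# Block 1: 9 specific short videos (3/3/3) and 4 film clips (2/1/1).
block1 = make_stimuli("specific", (3, 3, 3)) + make_stimuli("film_a", (2, 1, 1))
# Block 2: 9 comparison short videos (3/3/3) and the other 5 film clips (1/2/2).
block2 = make_stimuli("comparison", (3, 3, 3)) + make_stimuli("film_b", (1, 2, 2))

rng = random.Random(2023)  # fixed seed only so that the sketch is reproducible
for name, block in (("block 1", block1), ("block 2", block2)):
    order = make_block_order(block, rng)
    print(name, [stimulus_id for stimulus_id, _ in order])
```

Because each block contains at most five stimuli of one valence, most random shuffles already satisfy the constraint, so the loop terminates after very few attempts.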
In the EEG signal acquisition experiment, we considered the effects of age and gender differences on emotion elicitation from four aspects. To analyze the effect of gender difference, we selected two groups of participants aged 20-24: 20 males and 20 females. To analyze the effect of age difference, we chose two groups: 8 males aged 20-24 and 5 males aged 30-34. To analyze the combined effect of age and gender differences, we chose two groups: 8 males aged 20-24 and 5 females aged 30-34. In each of these three analyses, the specific short videos of one group served as the comparison short videos of the other group. To analyze the effect of specific short videos designed for participants of different ages and genders, we selected two groups aged 25-29: 8 males and 7 females. For each of these two groups, the comparison short videos were randomly selected from those designed for other groups.
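The group design can be written down as a small lookup table, which makes the pairing of specific and comparison short videos explicit. This is only an illustrative sketch: the names `Group`, `DESIGN` and `comparison_source` are hypothetical, and treating the eight groups as disjoint is an assumption, although it is consistent with their sizes summing to the 81 participants of Experiment 2.

```python
from collections import namedtuple

Group = namedtuple("Group", "gender age_range size")

# Participant groups as listed in the text, keyed by the analysis they serve.
DESIGN = {
    "gender": (Group("male", "20-24", 20), Group("female", "20-24", 20)),
    "age": (Group("male", "20-24", 8), Group("male", "30-34", 5)),
    "age_and_gender": (Group("male", "20-24", 8), Group("female", "30-34", 5)),
    "targeted_delivery": (Group("male", "25-29", 8), Group("female", "25-29", 7)),
}


def comparison_source(analysis, group):
    """Return the group whose specific videos act as comparison videos for `group`.

    Returns None for the targeted-delivery analysis, where the comparison
    videos are drawn at random from those designed for other groups.
    """
    first, second = DESIGN[analysis]
    if group not in (first, second):
        raise ValueError(f"{group} does not take part in the {analysis} analysis")
    if analysis == "targeted_delivery":
        return None
    return second if group == first else first


total_participants = sum(g.size for pair in DESIGN.values() for g in pair)
print(total_participants)
```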
Data processing. The processing of the raw EEG data included preprocessing and feature extraction. In the preprocessing phase, a bandpass filter was applied to retain the EEG data in the 0.1-50 Hz frequency range, and a 50 Hz notch filter was used for denoising. We then manually removed artifacts, namely the pronounced electromyogram and electro-oculogram components caused by muscle contraction, blinking or eye movement. For each participant, no more than 10 percent of the original EEG data were removed. In the feature extraction phase, the EEG data were sliced into non-overlapping 1-second segments, and the differential entropy (DE) features [48] were extracted in five frequency bands: δ (1-4 Hz), θ (4-8 Hz), α (8-12 Hz), β (13-30 Hz) and γ (31-45 Hz).
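The segmentation and DE computation can be sketched as follows. The sketch assumes that the signal of one channel has already been filtered into a single frequency band, and that the DE feature takes the widely used Gaussian closed form 0.5 ln(2πeσ²), where σ² is the variance of the segment; the exact definition in [48] may differ. The function names are hypothetical, and the filtering and artifact removal steps are not reproduced here.

```python
import math
import random
from statistics import pvariance

SAMPLING_RATE_HZ = 256  # sampling frequency of the Emotiv EPOC X stated in the text
SEGMENT_SECONDS = 1     # non-overlapping 1-second segments


def split_segments(samples, rate=SAMPLING_RATE_HZ, seconds=SEGMENT_SECONDS):
    """Cut a 1-D signal into non-overlapping segments, dropping an incomplete tail."""
    size = rate * seconds
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def differential_entropy(segment):
    """DE under a Gaussian assumption: 0.5 * ln(2 * pi * e * variance).

    A constant segment has zero variance and raises ValueError in math.log.
    """
    return 0.5 * math.log(2 * math.pi * math.e * pvariance(segment))


def de_features(band_signal):
    """Return one DE value per 1-second segment of a band-limited signal."""
    return [differential_entropy(segment) for segment in split_segments(band_signal)]


# Synthetic stand-in for one band-limited channel: 3.4 seconds of Gaussian noise.
rng = random.Random(0)
signal = [rng.gauss(0.0, 2.0) for _ in range(int(3.4 * SAMPLING_RATE_HZ))]
print(de_features(signal))  # three values, one per complete segment
```

Repeating this for the five bands and 14 channels would give 70 DE values per 1-second segment, which is one plausible layout of the classifier input.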

Results
EEG emotion recognition results. In this subsection, we used the EEG signals to perform an emotion recognition task, so that the elicitation effects of the emotional stimuli could be evaluated by the recognition accuracy. Since the recognition accuracy depends directly on the performance of the classifier, we first compared 11 classifiers commonly used in machine learning. In the experiment, the EEG data from all 81 participants were divided into a training set and a test set at a ratio of 8:2. The input of the classifiers was the DE features of the EEG signals, and the output was one of three emotion categories (positive, neutral or negative). The accuracies of the different classifiers are given in Fig 13. Random Forest achieved the highest accuracy among the 11 classifiers, 86.59%, so we selected it as the classifier for the subsequent EEG emotion recognition analysis. Table 3 shows the EEG emotion recognition results of the different participant groups elicited by the different video stimuli. The values in bold indicate the highest accuracy for each group of participants. Table 3 shows that, for both male and female participants aged 20-24, the accuracies for EEG signals elicited by specific short videos are higher than those for comparison short videos (male: 96.12% versus 91.22%; female: 87.28% versus 83.87%). This indicates that a gender difference does exist when watching emotional short videos, and thus that the specific short videos designed for participants of different genders exhibit better emotion elicitation effects. Similarly, Table 3 shows that the age difference also exists, as does the combined effect of age and gender differences.
In each of these groups, the accuracies for EEG signals elicited by specific short videos are higher than those for comparison short videos. In addition, for both male and female participants aged 25-29, the accuracies for specific short videos are also higher than those for short videos randomly selected from those designed for other groups, which validates the effectiveness of the targeted delivery of specific emotional short videos to different participant groups. Moreover, both the specific and the comparison short videos yield higher average accuracies than the film clips (91.39% and 86.37%, respectively, versus 81.12%), which supports the reliability of short videos as audiovisual stimuli for eliciting emotions in comparison with film clips.
Subjective evaluation results. In this subsection, we report the statistical analysis of the collected subjective evaluation results. Owing to space constraints, we take the male and female groups aged 20-24 from the gender difference analysis as an example, and show the statistical analysis results of their subjective evaluations in Tables 4 and 5. In Table 4, the statistical analysis covers the intensity, hit rate and success index of the different emotional stimuli. For each group of participants, the success indices in bold are the three highest values for each emotion category. For the positive emotion category, among the three video stimuli with the highest success indices for the male group, two are specific short videos and one is a comparison short video, while all three for the female group are specific short videos. For the neutral emotion category, the three video stimuli with the highest success indices for both the male and female groups are all specific short videos. For the negative emotion category, the three video stimuli with the highest success indices for the male group are all specific short videos.
Among the three video stimuli with the highest success indices for the female group, two are specific short videos and one is a comparison short video. In total, of the 18 video stimuli with the highest success indices across the three emotion categories, 16 are specific short videos and 2 are comparison short videos. This also confirms that short videos elicit emotions more effectively than film clips, and that the targeted delivery of specific emotional short videos to different participant groups is worthwhile. In addition, Table 4 compares the male and female groups on the intensity and hit rate for each video stimulus. Table 4 shows a significant gender difference in the intensity of the target emotion experienced for 10 short videos. This was because the participants rated their emotions as more intense when watching their specific short videos than when watching the other videos. For the hit rate, no significant differences were found between the male and female groups aged 20-24. Table 5 presents the statistical analysis results on the SAM scale for the male and female groups aged 20-24, including valence, arousal, liking, dominance and familiarity. In Table 5, the significant differences in valence and arousal ratings among the specific short videos reflect the successful elicitation of the target emotions. Furthermore, we found that the participants generally rated their liking of and familiarity with the specific short videos higher than those for the comparison short videos and film clips.

Discussion
This study presents the development of a standardized database of Chinese emotional short videos based on age and gender differences, in which both the subjective and physiological responses of participants were measured. For this database, a total of 54 short videos were selected that successfully elicited three categories of emotions in different age and gender groups. We conducted two experiments to validate these short videos as emotion elicitation stimuli and compared our database with the existing Chinese film database. The experimental results support our three hypotheses: 1) participants of different ages and genders show different emotional responses when watching emotional short videos; 2) short videos are reliable audiovisual stimuli and exhibit better emotion elicitation effects than film clips; 3) specific short videos designed for participants of different ages and genders have better emotion elicitation effects than those that ignore age and gender differences. A summary of the database contents is given in Table 6.
We found that age and gender differences do exist when participants watch emotional short videos, which is consistent with the previous literature on individual differences in emotional response. As for the age difference, the participants of the three age groups gave different valence scores for short videos of different emotion categories, especially for disgust, fear, tenderness and amusement, as shown in Fig 1, whereas the age difference in arousal appeared only in the emotion category of disgust. This age-related difference is similar to those reported in several previous studies [27][28][29]. The differences in emotional responses brought about by age can be explained from the perspective of cognitive science. Theoretically, the human cognitive system changes with age and experience.
The conceptual knowledge of emotion at a particular stage results from the differentiation of earlier conceptual knowledge of emotion under multiple factors. With regard to the gender difference, females tended to report stronger emotional responses. This is in line with previous research showing that females report higher arousal scores and have higher emotional expressivity [30][31][32]. It is reasonable to speculate that this difference stems from the gender-based socialization of humans over a long evolutionary period. Males took the role of going out to hunt and protecting their families, and had to be sensitive to threatening stimuli. Females, on the other hand, were tasked with bearing offspring, and needed a greater ability to recognize the emotions of others and to express their own emotions in order to receive more support and assistance [49]. Although age differences and gender differences have each been confirmed by relevant works, no studies have examined age and gender differences simultaneously on the same emotional stimuli. In our database, the age and gender differences of participants are explored together for the first time. The experimental results show that age and gender differences appear in both the valence and arousal dimensions when watching emotional short videos. This finding justifies providing different emotional stimuli for different participants.
A major contribution of the current work is the design of specific short videos for participants of different ages and genders. Existing emotion elicitation databases provide the same stimuli for all participants; however, participants of different ages and genders are very likely to respond differently to the same stimuli. Therefore, the traditional approach of constructing one emotion elicitation database for all participants cannot guarantee the emotion elicitation effects. In contrast to previous studies, our database takes age and gender differences into account and provides specific emotional short videos that induce three emotion categories (positive, neutral and negative) for participants grouped by age (20-24, 25-29 and 30-34) and gender (male and female). In terms of both objective physiological signals and subjective evaluation, the experimental results have verified the effectiveness of the targeted delivery of emotional short videos based on individual differences.
Another contribution of this work is the concurrent measurement of subjective and physiological responses. Most emotion elicitation databases were established by recording and analyzing subjective evaluation results. Because they are limited by individual subjective consciousness, subjective methods can hardly reflect the real emotional state of participants objectively. As far as we know, only one Chinese emotion elicitation database has recorded physiological responses, namely heart rate and respiration rate [24]. Compared with other physiological signals, EEG signals directly record the changes in scalp potential and can reflect the real emotional state of participants more reliably. To capture the real emotional states of participants, we used the Emotiv EPOC X device and the EmotivPRO software to record the EEG signals of participants while they watched the video stimuli. The collected EEG signals can be used together with the subjective evaluation to objectively assess the elicitation effects of the emotional stimulus database.
As described above, the current study extends the previous literature on emotion elicitation databases in several ways. However, some limitations are worth noting. First, more than half of the participants in our study were aged 20-24, while the numbers of participants aged 25-29 and 30-34 were relatively small. To improve the generalizability of the findings, we will expand the sample of participants aged 25-34 in future work. Second, our work was carried out on participants aged 20-34; future studies should expand the age range of the database to make it applicable to a wider range of age groups. Third, during the experiments we recorded EEG signals, GSR and face videos as physiological signals, but only the EEG signals were used for emotion recognition.
In the future, we can further analyze multimodal physiological signals such as GSR and face videos to obtain more comprehensive and accurate emotion analysis results. Fourth, eliciting emotions in a participant depends on a number of factors, such as age, gender, personal experience and familiarity with the video. We have conducted a pilot study of the age and gender differences in emotional responses when watching short videos; the effects of other factors, such as personal experience and familiarity with the video, will be considered in future work. Last but not least, our database was built for Chinese participants, and it may induce different types of emotions or different emotional intensities in participants of other nationalities. This does not mean that our database cannot be used to elicit emotions in participants of other nationalities; it also allows researchers to compare the emotional responses of participants of different nationalities.

Conclusion
This paper develops a standardized database of Chinese emotional short videos based on age and gender differences. By analyzing the valence and arousal scores given by participants in the subjective evaluations, we found that age and gender differences do exist when participants watch emotional short videos. Motivated by this finding, specific short videos were designed for participants of different ages and genders in order to achieve better emotion elicitation effects. As a result, a total of 54 emotional short videos were selected for 6 groups of participants, namely males and females aged 20-24, 25-29 and 30-34. Each group was matched with 9 specific short videos: 3 positive, 3 neutral and 3 negative stimuli. Both subjective experiences and physiological responses were recorded and analyzed to validate the effectiveness of our emotional short video database.

Supporting information
S1 Video. The supporting information of this paper can be downloaded from the following link: https://github.com/EEG-Emotion-Recognition-group/A-standardized-database-of-Chineseemotional-short-videos-based-on-age-and-gender-differences.git.